Lab 6 - Logistic Regression¶

Goal :Our objective is to categorize Iris flowers into two classes based on their dimensional attributes. The classification is binary, there are two classes: virginica and non-virginica.

Importing necessary Modules¶

In [35]:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import warnings
from sklearn.model_selection import cross_val_score
import numpy as np
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from matplotlib.colors import ListedColormap
import plotly.graph_objs as go
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder
import plotly
plotly.offline.init_notebook_mode()
import numpy as np
import pandas as pd
from mlxtend.plotting import plot_decision_regions

Loading Iris Dataset¶

In [36]:
from sklearn.datasets import load_iris

iris_dataset = load_iris(as_frame=True)

iris_dataset.data.head()
Out[36]:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2

Checking Virginica as true from data set other are false¶

In [37]:
y = iris_dataset.target_names[iris_dataset.target] == 'virginica'
y
Out[37]:
array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True])

Defining value = 2 is virginica and other value such as 0 and 1 to non virginica¶

In [38]:
for i in range(0, len(iris_dataset.target)):
    if iris_dataset.target[i] == 2:
        iris_dataset.target[i] = 'virginica'
    else:
        iris_dataset.target[i] = 'non-virginica'
In [39]:
iris_dataset
Out[39]:
{'data':      sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
 0                  5.1               3.5                1.4               0.2
 1                  4.9               3.0                1.4               0.2
 2                  4.7               3.2                1.3               0.2
 3                  4.6               3.1                1.5               0.2
 4                  5.0               3.6                1.4               0.2
 ..                 ...               ...                ...               ...
 145                6.7               3.0                5.2               2.3
 146                6.3               2.5                5.0               1.9
 147                6.5               3.0                5.2               2.0
 148                6.2               3.4                5.4               2.3
 149                5.9               3.0                5.1               1.8
 
 [150 rows x 4 columns],
 'target': 0      non-virginica
 1      non-virginica
 2      non-virginica
 3      non-virginica
 4      non-virginica
            ...      
 145        virginica
 146        virginica
 147        virginica
 148        virginica
 149        virginica
 Name: target, Length: 150, dtype: object,
 'frame':      sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)   
 0                  5.1               3.5                1.4               0.2  \
 1                  4.9               3.0                1.4               0.2   
 2                  4.7               3.2                1.3               0.2   
 3                  4.6               3.1                1.5               0.2   
 4                  5.0               3.6                1.4               0.2   
 ..                 ...               ...                ...               ...   
 145                6.7               3.0                5.2               2.3   
 146                6.3               2.5                5.0               1.9   
 147                6.5               3.0                5.2               2.0   
 148                6.2               3.4                5.4               2.3   
 149                5.9               3.0                5.1               1.8   
 
      target  
 0         0  
 1         0  
 2         0  
 3         0  
 4         0  
 ..      ...  
 145       2  
 146       2  
 147       2  
 148       2  
 149       2  
 
 [150 rows x 5 columns],
 'target_names': array(['setosa', 'versicolor', 'virginica'], dtype='<U10'),
 'DESCR': '.. _iris_dataset:\n\nIris plants dataset\n--------------------\n\n**Data Set Characteristics:**\n\n    :Number of Instances: 150 (50 in each of three classes)\n    :Number of Attributes: 4 numeric, predictive attributes and the class\n    :Attribute Information:\n        - sepal length in cm\n        - sepal width in cm\n        - petal length in cm\n        - petal width in cm\n        - class:\n                - Iris-Setosa\n                - Iris-Versicolour\n                - Iris-Virginica\n                \n    :Summary Statistics:\n\n    ============== ==== ==== ======= ===== ====================\n                    Min  Max   Mean    SD   Class Correlation\n    ============== ==== ==== ======= ===== ====================\n    sepal length:   4.3  7.9   5.84   0.83    0.7826\n    sepal width:    2.0  4.4   3.05   0.43   -0.4194\n    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)\n    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)\n    ============== ==== ==== ======= ===== ====================\n\n    :Missing Attribute Values: None\n    :Class Distribution: 33.3% for each of 3 classes.\n    :Creator: R.A. Fisher\n    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)\n    :Date: July, 1988\n\nThe famous Iris database, first used by Sir R.A. Fisher. The dataset is taken\nfrom Fisher\'s paper. Note that it\'s the same as in R, but not as in the UCI\nMachine Learning Repository, which has two wrong data points.\n\nThis is perhaps the best known database to be found in the\npattern recognition literature.  Fisher\'s paper is a classic in the field and\nis referenced frequently to this day.  (See Duda & Hart, for example.)  The\ndata set contains 3 classes of 50 instances each, where each class refers to a\ntype of iris plant.  One class is linearly separable from the other 2; the\nlatter are NOT linearly separable from each other.\n\n.. topic:: References\n\n   - Fisher, R.A. "The use of multiple measurements in taxonomic problems"\n     Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to\n     Mathematical Statistics" (John Wiley, NY, 1950).\n   - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.\n     (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.\n   - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System\n     Structure and Classification Rule for Recognition in Partially Exposed\n     Environments".  IEEE Transactions on Pattern Analysis and Machine\n     Intelligence, Vol. PAMI-2, No. 1, 67-71.\n   - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE Transactions\n     on Information Theory, May 1972, 431-433.\n   - See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al"s AUTOCLASS II\n     conceptual clustering system finds 3 classes in the data.\n   - Many, many more ...',
 'feature_names': ['sepal length (cm)',
  'sepal width (cm)',
  'petal length (cm)',
  'petal width (cm)'],
 'filename': 'iris.csv',
 'data_module': 'sklearn.datasets.data'}

Description of data set dataset¶

In [40]:
iris_dataset.DESCR
print(iris_dataset.DESCR)
.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

    ============== ==== ==== ======= ===== ====================
                    Min  Max   Mean    SD   Class Correlation
    ============== ==== ==== ======= ===== ====================
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
    ============== ==== ==== ======= ===== ====================

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988

The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
from Fisher's paper. Note that it's the same as in R, but not as in the UCI
Machine Learning Repository, which has two wrong data points.

This is perhaps the best known database to be found in the
pattern recognition literature.  Fisher's paper is a classic in the field and
is referenced frequently to this day.  (See Duda & Hart, for example.)  The
data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant.  One class is linearly separable from the other 2; the
latter are NOT linearly separable from each other.

.. topic:: References

   - Fisher, R.A. "The use of multiple measurements in taxonomic problems"
     Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
     Mathematical Statistics" (John Wiley, NY, 1950).
   - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.
     (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.
   - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
     Structure and Classification Rule for Recognition in Partially Exposed
     Environments".  IEEE Transactions on Pattern Analysis and Machine
     Intelligence, Vol. PAMI-2, No. 1, 67-71.
   - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE Transactions
     on Information Theory, May 1972, 431-433.
   - See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al"s AUTOCLASS II
     conceptual clustering system finds 3 classes in the data.
   - Many, many more ...

The Iris dataset is a well-known dataset used in pattern recognition research, famously introduced by Sir R.A. Fisher. It comprises 150 instances, with each representing a different iris plant. There are four numeric attributes measured in centimeters: sepal length, sepal width, petal length, and petal width. The dataset includes three classes of iris plants: Iris-Setosa, Iris-Versicolour, and Iris-Virginica, each containing 50 instances. Notably, while one class is linearly separable from the other two, the latter two are not linearly separable from each other. The dataset's summary statistics provide insights into the range, mean, and standard deviation for each attribute, along with class correlations. It's widely used in pattern classification research and serves as a fundamental dataset in the field.

It includes three iris species with 50 samples each as well as some properties about each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other.

The columns in this dataset are:

ID

SepalLengthCm

SepalWidthCm

PetalLengthCm

PetalWidthCm

Tagrget

In [41]:
from sklearn.datasets import load_iris
import pandas as pd

# Load the Iris dataset
iris = load_iris(as_frame=True)

# Convert data and target attributes to DataFrame
iris_dataset = pd.concat([iris.data, iris.target], axis=1)
iris_dataset.columns = iris.feature_names + ['target']

# Replace numerical target values with class names
iris_dataset['target'] = iris.target_names[iris_dataset['target']]

# Display the DataFrame
print(iris_dataset)
     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)   
0                  5.1               3.5                1.4               0.2  \
1                  4.9               3.0                1.4               0.2   
2                  4.7               3.2                1.3               0.2   
3                  4.6               3.1                1.5               0.2   
4                  5.0               3.6                1.4               0.2   
..                 ...               ...                ...               ...   
145                6.7               3.0                5.2               2.3   
146                6.3               2.5                5.0               1.9   
147                6.5               3.0                5.2               2.0   
148                6.2               3.4                5.4               2.3   
149                5.9               3.0                5.1               1.8   

        target  
0       setosa  
1       setosa  
2       setosa  
3       setosa  
4       setosa  
..         ...  
145  virginica  
146  virginica  
147  virginica  
148  virginica  
149  virginica  

[150 rows x 5 columns]
In [42]:
iris_dataset["target"]
Out[42]:
0         setosa
1         setosa
2         setosa
3         setosa
4         setosa
         ...    
145    virginica
146    virginica
147    virginica
148    virginica
149    virginica
Name: target, Length: 150, dtype: object
In [43]:
for i in range(0, len(iris_dataset["target"])):
    if iris_dataset["target"][i] == 'virginica':
        iris_dataset.loc[i, "target"] = 'virginica'
    else:
        iris_dataset.loc[i, "target"] = 'non-virginica'
In [44]:
# Filter data for virginica and non-virginica groups
virginica_category = iris_dataset[iris_dataset.target == 'virginica']
non_virginica_category = iris_dataset[iris_dataset.target == 'non-virginica']
In [45]:
virginica_category.describe()
Out[45]:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
count 50.00000 50.000000 50.000000 50.00000
mean 6.58800 2.974000 5.552000 2.02600
std 0.63588 0.322497 0.551895 0.27465
min 4.90000 2.200000 4.500000 1.40000
25% 6.22500 2.800000 5.100000 1.80000
50% 6.50000 3.000000 5.550000 2.00000
75% 6.90000 3.175000 5.875000 2.30000
max 7.90000 3.800000 6.900000 2.50000

Count: This indicates the number of data points in the dataset for each feature. there are 50 data points for each feature.

Mean: This represents the average value of each feature across all data points. For example, the mean sepal length is 6.588 cm, the mean sepal width is 2.974 cm, the mean petal length is 5.552 cm, and the mean petal width is 2.026 cm.

Std (Standard Deviation): This measures the dispersion or spread of the values around the mean. A higher standard deviation indicates greater variability in the data. For instance, the standard deviation of sepal length is 0.63588 cm.

Min: This shows the minimum value observed for each feature. For instance, the minimum sepal length observed is 4.9 cm.

Max: This indicates the maximum value observed for each feature. For example, the maximum sepal length observed is 7.9 cm.

25%, 50%, and 75%: These represent the quartiles of the data distribution. The 25th percentile (Q1) indicates the value below which 25% of the data falls, the 50th percentile (Q2 or median) represents the value below which 50% of the data falls, and the 75th percentile (Q3) indicates the value below which 75% of the data falls.

In [46]:
virginica_category.head()
Out[46]:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target
100 6.3 3.3 6.0 2.5 virginica
101 5.8 2.7 5.1 1.9 virginica
102 7.1 3.0 5.9 2.1 virginica
103 6.3 2.9 5.6 1.8 virginica
104 6.5 3.0 5.8 2.2 virginica
In [47]:
non_virginica_category.describe()
Out[47]:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
count 100.000000 100.000000 100.000000 100.000000
mean 5.471000 3.099000 2.861000 0.786000
std 0.641698 0.478739 1.449549 0.565153
min 4.300000 2.000000 1.000000 0.100000
25% 5.000000 2.800000 1.500000 0.200000
50% 5.400000 3.050000 2.450000 0.800000
75% 5.900000 3.400000 4.325000 1.300000
max 7.000000 4.400000 5.100000 1.800000
In [48]:
non_virginica_category.head()
Out[48]:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target
0 5.1 3.5 1.4 0.2 non-virginica
1 4.9 3.0 1.4 0.2 non-virginica
2 4.7 3.2 1.3 0.2 non-virginica
3 4.6 3.1 1.5 0.2 non-virginica
4 5.0 3.6 1.4 0.2 non-virginica

Plotting Histogram of feature for each class.¶

In [49]:
# Define a custom color palette
custom_palette = {"non-virginica": "#FF5733", "virginica": "#33FF57"}

# Set the style
sns.set(style="whitegrid")

# Plot histograms for each feature, separated by class
for feature in iris_dataset.columns[:-1]: 
    plt.figure(figsize=(8, 6))
    sns.histplot(data=iris_dataset, x=feature, hue="target", hue_order=['non-virginica', 'virginica'],
                 palette=custom_palette, kde=True, legend=True)
    plt.title(f"Histogram of {feature} for each class")
    plt.xlabel(feature)
    plt.ylabel("Frequency")
    plt.show()

Plotting Correlation Matrix of Features for the "Virginica" Group¶

In [50]:
# Select data for the 'virginica' group
virginica_df = iris_dataset[iris_dataset['target'] == 'virginica']

# Calculate the correlation matrix
correlation_matrix = iris_dataset.iloc[:, :-1].corr()

# Set up the matplotlib figure
plt.figure(figsize=(8, 6))

# Draw the heatmap
sns.heatmap(correlation_matrix, annot=True, cmap='viridis', linewidths=0.5)

plt.title('Correlation Matrix of Features', fontsize=14)
plt.xlabel('Features', fontsize=12)
plt.ylabel('Features', fontsize=12)
plt.xticks(fontsize=10)
plt.yticks(fontsize=10)
plt.show()

Plotting Correlation Matrix of Features for the "Virginica" and "Non-Virginica" Group¶

In [51]:
# Generate correlation matrices for both groups
corr_matrix_virginica = virginica_category.iloc[:, :-1].corr()
corr_matrix_non_virginica = non_virginica_category.iloc[:, :-1].corr()

# Set up the figure and axes
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Draw heatmap for the "virginica" group
sns.heatmap(corr_matrix_virginica, annot=True, cmap='magma', linewidths=.5, ax=axes[0])
axes[0].set_title('Correlation Matrix of Features for the "Virginica" Group')

# Draw heatmap for the "non-virginica" group
sns.heatmap(corr_matrix_non_virginica, annot=True, cmap='magma', linewidths=.5, ax=axes[1])
axes[1].set_title('Correlation Matrix of Features for the "Non-Virginica" Group')

plt.tight_layout()
plt.show()

3 plots from Kagle¶

In [52]:
# Create the boxplot with red and yellow boxes
ax = sns.boxplot(x="target", y="petal length (cm)", data=iris_dataset, palette={"virginica": "red", "non-virginica": "yellow"})

# Create the stripplot with black dots
sns.stripplot(x="target", y="petal length (cm)", data=iris_dataset, jitter=True, edgecolor="black", ax=ax)

# Show the plot
plt.show()

Here is the link of reference : kaggle

Graph Description:

  • scatter plot provide a summary of the distribution of each numeric feature for each species.

They show the median, quartiles, and potential outliers.

  • From the scatter plot, we can see that some features (e.g., petal length and petal width) have significant differences in their distributions among the three species, making them valuable for classification.
  • This scatter plot represents the petal length (in centimeters) of two categories of flowers: non-virginica and virginica.
  • Each blue dot on the graph corresponds to an individual measurement of petal length for a flower.

The background is divided into two regions:

  • Yellow Background: Indicates the typical range of petal lengths for non-virginica flowers.
  • Red Background: Represents the common range for virginica flowers’ petal length.

Key Observations:

Non-virginica Flowers:

  • Most data points for non-virginica flowers fall between 1 to 5 centimeters in petal length.
  • The yellow region captures this range.

Virginica Flowers:

  • Virginica flowers tend to have longer petal lengths.
  • Their data points cluster primarily between 4 to 7 centimeters.
  • The red region corresponds to this range.

Overlap:

  • There is some overlap in petal lengths around 4 to 5 centimeters.
  • In this range, it might be challenging to distinguish between the two categories based solely on petal length.

Conclusion: Petal length can serve as a useful feature for classifying flowers as either non-virginica or virginica. However, other features or a combination of attributes may be necessary for more accurate classification, especially in the overlapping range.

In [53]:
# Filter the dataset for each species
virginica_data = iris_dataset[iris_dataset['target'] == 'virginica']
non_virginica_data = iris_dataset[iris_dataset['target'] != 'virginica']

# Set up the figure and axes
fig, axes = plt.subplots(2, 1, figsize=(10, 8))

# Plot histograms for virginica
virginica_data.plot(kind='hist', bins=50, range=(0, 8), alpha=0.3, ax=axes[0])
axes[0].set_title('Virginica Data Set')
axes[0].set_xlabel('[cm]')

# Plot histograms for non-virginica
non_virginica_data.plot(kind='hist', bins=50, range=(0, 8), alpha=0.3, ax=axes[1])
axes[1].set_title('Non-Virginica Data Set')
axes[1].set_xlabel('[cm]')

plt.tight_layout()
plt.show()

Here is the link of reference : kaggle

Graph Description:

Reason for choosing :

  • this graph provides insights into the relationship between petal length and flower categories, but additional context and features would enhance our understanding of flower classification
  • This scatter plot represents the petal length (in centimeters) of two categories of flowers: non-virginica and virginica.
  • Each blue dot on the graph corresponds to an individual measurement of petal length for a flower.

The background is divided into two regions:

  • Yellow Background: Indicates the typical range of petal lengths for non-virginica flowers.
  • Red Background: Represents the common range for virginica flowers’ petal length.

Key Observations:

Non-virginica Flowers:

  • Most data points for non-virginica flowers fall between 1 to 5 centimeters in petal length.
  • The yellow region captures this range.

Virginica Flowers:

  • Virginica flowers tend to have longer petal lengths.
  • Their data points cluster primarily between 4 to 7 centimeters.
  • The red region corresponds to this range.

Conclusion:

  • Petal length can serve as a useful feature for classifying flowers as either non-virginica or virginica.
  • However, other features or a combination of attributes may be necessary for more accurate classification, especially in the overlapping range.
In [54]:
fig = px.scatter_3d(iris_dataset, x="sepal width (cm)", y="petal length (cm)", z='petal width (cm)',
              color='target')
fig.show()

Here is the link of reference : kaggle

3d Scatter Plot Overview:

  • this 3d scatter plot provides valuable insights into the relationship between petal dimensions and flower classification.
  • The graph displays data points for two types of flowers: “virginica” (red points) and “non-virginica” (blue points).
  • The x-axis represents petal length (in centimeters), ranging approximately from 1 to 7 cm.
  • The y-axis represents petal width (in centimeters), ranging approximately from 0.5 to 2.5 cm.

Interpretation:

Blue Points (non-virginica): These points cluster towards the lower end of both axes.Non-virginica flowers generally have smaller petal lengths (around 1-2 cm) and narrower petal widths (around 0.5-1 cm).

Red Points (virginica): These points are more spread out across the graph.Virginica flowers exhibit greater variability: Their petal lengths mostly fall between 4-7 cm.Their petal widths vary between 1-2.5 cm.

Conclusion

  • The 3D scatter plot shows two types of flowers: "virginica" (in red) and "non-virginica" (in blue). We can see that virginica flowers generally have larger petals than non-virginica ones, which helps us tell them apart. The plot highlights that petal size, specifically length and width, is crucial for distinguishing between these flower types. This insight can be useful for botanists and gardeners to classify and identify different flower species based on their petal characteristics. Overall, the plot gives us a clear picture of how petal dimensions relate to flower classification.

Split the data to a train set (120 records), a validation set (15 records) and a test set (15 records).¶

In [55]:
# Split the data into training, validation, and test sets
X = iris_dataset.iloc[:, :-1]
X_train, X_temp, y_train, y_temp = train_test_split(iris_dataset.iloc[:, :-1], iris_dataset['target'], test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

# Print the shapes of the resulting sets
print("Shape of the Training Set:", X_train.shape, y_train.shape)
print("Shape of the Validation Set:", X_val.shape, y_val.shape)
print("Shape of the Test Set:", X_test.shape, y_test.shape)
Shape of the Training Set: (120, 4) (120,)
Shape of the Validation Set: (15, 4) (15,)
Shape of the Test Set: (15, 4) (15,)

Run four logistic regression models models, with 1,2,3 and 4 features¶

In [56]:
# Specify the number of features to consider
feature_number = [1, 2, 3, 4]

# Feature names
feature_names = iris.feature_names

# Iterate over different numbers of features
for count in feature_number:
    # Select the first 'count' features from the training, validation, and test sets
    X_train_selected = X_train.iloc[:, :count]
    X_val_selected = X_val.iloc[:, :count]
    X_test_selected = X_test.iloc[:, :count]
    
    # Initialize and train a logistic regression model
    model = LogisticRegression(random_state=42)
    model.fit(X_train_selected, y_train)
    
    # Predict labels for the validation set
    y_pred = model.predict(X_val_selected)
    
    # Calculate accuracy of the model
    accuracy = accuracy_score(y_val, y_pred)
    
    # Display accuracy for the current feature count
    print("Number of features considered:", count)
    print("Features:", ', '.join(feature_names[:count]))
    print("Validation Accuracy:", accuracy)
    print()  # Empty line for readability

    del y_pred  
    
Number of features considered: 1
Features: sepal length (cm)
Validation Accuracy: 0.9333333333333333

Number of features considered: 2
Features: sepal length (cm), sepal width (cm)
Validation Accuracy: 0.9333333333333333

Number of features considered: 3
Features: sepal length (cm), sepal width (cm), petal length (cm)
Validation Accuracy: 1.0

Number of features considered: 4
Features: sepal length (cm), sepal width (cm), petal length (cm), petal width (cm)
Validation Accuracy: 1.0

In [57]:
y_test_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_test_pred)

Using LogisticRegression to make 4 model with different features¶

In [58]:
# Suppress warnings
warnings.filterwarnings("ignore")

# Define a function to train and evaluate logistic regression models with different numbers of features
def train_and_evaluate(X_train, X_val, X_test, y_train, y_val, y_test, num_features):
    # Select the first 'num_features' columns for training
    X_train_subset = X_train.iloc[:, :num_features]
    X_val_subset = X_val.iloc[:, :num_features]
    X_test_subset = X_test.iloc[:, :num_features]

    # Initialize and train logistic regression model
    model = LogisticRegression()
    model.fit(X_train_subset, y_train)

    # Make predictions on validation set
    y_pred_val = model.predict(X_val_subset)

    # Calculate accuracy on validation set
    accuracy_val = accuracy_score(y_val, y_pred_val)
    
    # Make predictions on test set
    y_pred_test = model.predict(X_test_subset)

    # Calculate accuracy on test set
    accuracy_test = accuracy_score(y_test, y_pred_test)

    del y_pred_val
    del y_pred_test
    del model
    
    return accuracy_val, accuracy_test

# Train and evaluate logistic regression models with different numbers of features
feature_names = iris.feature_names
for num_features in range(1, 5):
    # Call the function to train and evaluate the model
    accuracy_val, accuracy_test = train_and_evaluate(X_train, X_val, X_test, y_train, y_val, y_test, num_features)
    
    print("Model with {} feature(s):".format(num_features))
    print("Features:", ', '.join(feature_names[:num_features]))
    print("Validation accuracy:", accuracy_val)
    print("Test accuracy:", accuracy_test)
    print()
Model with 1 feature(s):
Features: sepal length (cm)
Validation accuracy: 0.9333333333333333
Test accuracy: 0.9333333333333333

Model with 2 feature(s):
Features: sepal length (cm), sepal width (cm)
Validation accuracy: 0.9333333333333333
Test accuracy: 0.8666666666666667

Model with 3 feature(s):
Features: sepal length (cm), sepal width (cm), petal length (cm)
Validation accuracy: 1.0
Test accuracy: 1.0

Model with 4 feature(s):
Features: sepal length (cm), sepal width (cm), petal length (cm), petal width (cm)
Validation accuracy: 1.0
Test accuracy: 1.0

Additional work :- Mean Cross-Validation Accuracy of all the 4 model¶

In [59]:
# Define a function to perform cross-validation on logistic regression models with varying feature subsets
def perform_cross_validation(X, y, num_features):
    # Select the first 'num_features' columns for training
    X_subset = X.iloc[:, :num_features]

    # Initialize the logistic regression model
    model = LogisticRegression()

    # Perform 5-fold cross-validation
    cross_val_scores = cross_val_score(model, X_subset, y, cv=5)

    return cross_val_scores.mean()

# Iterate through different numbers of features and perform cross-validation
for num_features in range(1, 5):
    mean_cross_val_accuracy = perform_cross_validation(pd.concat([X_train, X_val]), pd.concat([y_train, y_val]), num_features)
    print(f"Model with {num_features} feature(s) - Mean Cross-Validation Accuracy: {mean_cross_val_accuracy}")
Model with 1 feature(s) - Mean Cross-Validation Accuracy: 0.8074074074074075
Model with 2 feature(s) - Mean Cross-Validation Accuracy: 0.7925925925925925
Model with 3 feature(s) - Mean Cross-Validation Accuracy: 0.9555555555555555
Model with 4 feature(s) - Mean Cross-Validation Accuracy: 0.962962962962963

Table icluding 4 feature : Instance Number | Probability of Virginica | Prediction | Ground Truth¶

In [60]:
from tabulate import tabulate

# Define a function to create the evaluation table for a given model
def create_evaluation_table(model, X_val, y_val):
    # Make predictions and probabilities
    y_pred = model.predict(X_val)
    y_prob = model.predict_proba(X_val)
    
    # Get the occurrence number from the original dataset
    occurrence_number = X_val.index + 1
    
    # Create the evaluation table
    evaluation_table = pd.DataFrame({
        'Instance Number': occurrence_number,
        'Probability of Virginica': y_prob[:, 1], 
        'Prediction': y_pred,
        'Ground Truth': y_val
    })
    
    del y_prob
    del y_pred
    return evaluation_table.set_index('Instance Number')

# Define a function to print the evaluation table for a given model
def print_evaluation_table(model_name, eval_table):
    print(f"Evaluation Table for Model: {model_name}")
    print(tabulate(eval_table, headers='keys', tablefmt='psql'))
    print()

# Iterate through each model and print its evaluation table
def individual_table(num_features):
    # Train the model
    model = LogisticRegression()
    model.fit(X_train.iloc[:, :num_features], y_train)
    
    # Create and print the evaluation table
    eval_table = create_evaluation_table(model, X_val.iloc[:, :num_features], y_val)
    print_evaluation_table(f"Model with {num_features} feature(s)", eval_table)

    del model
    return eval_table

def calculate_accuracy(eval_table):
   
    correct_predictions = (eval_table['Prediction'] == eval_table['Ground Truth']).sum()
    
   
    total_instances = len(eval_table)
    
   
    accuracy = (correct_predictions / total_instances) * 100
    
    print(f"{accuracy:.2f}% is the prediction accuracy.")

predict_table = individual_table(1)
calculate_accuracy(predict_table)
Evaluation Table for Model: Model with 1 feature(s)
+-------------------+----------------------------+---------------+----------------+
|   Instance Number |   Probability of Virginica | Prediction    | Ground Truth   |
|-------------------+----------------------------+---------------+----------------|
|                27 |                  0.06451   | non-virginica | non-virginica  |
|                19 |                  0.217912  | non-virginica | non-virginica  |
|               119 |                  0.937717  | virginica     | virginica      |
|               146 |                  0.671933  | virginica     | virginica      |
|                79 |                  0.336388  | non-virginica | non-virginica  |
|               128 |                  0.382264  | non-virginica | virginica      |
|               109 |                  0.671933  | virginica     | virginica      |
|                56 |                  0.217912  | non-virginica | non-virginica  |
|                31 |                  0.0442258 | non-virginica | non-virginica  |
|                30 |                  0.0365199 | non-virginica | non-virginica  |
|               142 |                  0.753228  | virginica     | virginica      |
|               111 |                  0.578831  | virginica     | virginica      |
|                20 |                  0.0776461 | non-virginica | non-virginica  |
|               133 |                  0.529589  | virginica     | virginica      |
|                65 |                  0.185827  | non-virginica | non-virginica  |
+-------------------+----------------------------+---------------+----------------+

93.33% is the prediction accuracy.
In [61]:
from tabulate import tabulate

# Define a function to create the evaluation table for a given model
def create_evaluation_table(model, X_val, y_val):
    # Make predictions and probabilities
    y_pred = model.predict(X_val)
    y_pred_proba = model.predict_proba(X_val)
    
    # Get the occurrence number from the original dataset
    occurrence_number = X_val.index + 1
    
    # Create the evaluation table
    table_eve = pd.DataFrame({
        'Instance Number': occurrence_number,
        'Probability of Virginica': y_pred_proba[:, 1], 
        'Prediction': y_pred,
        'Ground Truth': y_val
    })

    del y_pred
    del y_pred_proba
    
    return table_eve.set_index('Instance Number')

# Define a function to print the evaluation table for a given model
def print_evaluation_table(model_name, table_eve):
    print(f"Evaluation Table for Model: {model_name}")
    print(tabulate(table_eve, headers='keys', tablefmt='psql'))
    print()

# Iterate through each model and print its evaluation table
def individual_table(num_features):
    # Train the model
    model = LogisticRegression()
    model.fit(X_train.iloc[:, :num_features], y_train)
    
    # Create and print the evaluation table
    evaluation_table = create_evaluation_table(model, X_val.iloc[:, :num_features], y_val)
    print_evaluation_table(f"Model with {num_features} feature(s)", evaluation_table)

    del model

    return evaluation_table

def calculate_accuracy(table_eve):
        accu_predictions = (table_eve['Prediction'] == table_eve['Ground Truth']).sum()
    
        instances_all = len(table_eve)
    
        accuracy = (accu_predictions / instances_all) * 100
    
        print(f"{accuracy:.2f}% is the prediction accuracy.")
predict_table = individual_table(1)
calculate_accuracy(predict_table)
Evaluation Table for Model: Model with 1 feature(s)
+-------------------+----------------------------+---------------+----------------+
|   Instance Number |   Probability of Virginica | Prediction    | Ground Truth   |
|-------------------+----------------------------+---------------+----------------|
|                27 |                  0.06451   | non-virginica | non-virginica  |
|                19 |                  0.217912  | non-virginica | non-virginica  |
|               119 |                  0.937717  | virginica     | virginica      |
|               146 |                  0.671933  | virginica     | virginica      |
|                79 |                  0.336388  | non-virginica | non-virginica  |
|               128 |                  0.382264  | non-virginica | virginica      |
|               109 |                  0.671933  | virginica     | virginica      |
|                56 |                  0.217912  | non-virginica | non-virginica  |
|                31 |                  0.0442258 | non-virginica | non-virginica  |
|                30 |                  0.0365199 | non-virginica | non-virginica  |
|               142 |                  0.753228  | virginica     | virginica      |
|               111 |                  0.578831  | virginica     | virginica      |
|                20 |                  0.0776461 | non-virginica | non-virginica  |
|               133 |                  0.529589  | virginica     | virginica      |
|                65 |                  0.185827  | non-virginica | non-virginica  |
+-------------------+----------------------------+---------------+----------------+

93.33% is the prediction accuracy.
In [62]:
predict_table = individual_table(2)
calculate_accuracy(predict_table)
Evaluation Table for Model: Model with 2 feature(s)
+-------------------+----------------------------+---------------+----------------+
|   Instance Number |   Probability of Virginica | Prediction    | Ground Truth   |
|-------------------+----------------------------+---------------+----------------|
|                27 |                  0.050796  | non-virginica | non-virginica  |
|                19 |                  0.145047  | non-virginica | non-virginica  |
|               119 |                  0.949865  | virginica     | virginica      |
|               146 |                  0.66928   | virginica     | virginica      |
|                79 |                  0.347378  | non-virginica | non-virginica  |
|               128 |                  0.379491  | non-virginica | virginica      |
|               109 |                  0.732581  | virginica     | virginica      |
|                56 |                  0.237155  | non-virginica | non-virginica  |
|                31 |                  0.0412891 | non-virginica | non-virginica  |
|                30 |                  0.0321397 | non-virginica | non-virginica  |
|               142 |                  0.739476  | virginica     | virginica      |
|               111 |                  0.546099  | virginica     | virginica      |
|                20 |                  0.0487707 | non-virginica | non-virginica  |
|               133 |                  0.556691  | virginica     | virginica      |
|                65 |                  0.193357  | non-virginica | non-virginica  |
+-------------------+----------------------------+---------------+----------------+

93.33% is the prediction accuracy.
In [63]:
predict_table = individual_table(3)
calculate_accuracy(predict_table)
Evaluation Table for Model: Model with 3 feature(s)
+-------------------+----------------------------+---------------+----------------+
|   Instance Number |   Probability of Virginica | Prediction    | Ground Truth   |
|-------------------+----------------------------+---------------+----------------|
|                27 |                1.57225e-05 | non-virginica | non-virginica  |
|                19 |                1.3357e-05  | non-virginica | non-virginica  |
|               119 |                0.998416    | virginica     | virginica      |
|               146 |                0.697669    | virginica     | virginica      |
|                79 |                0.227787    | non-virginica | non-virginica  |
|               128 |                0.520907    | virginica     | virginica      |
|               109 |                0.95854     | virginica     | virginica      |
|                56 |                0.262784    | non-virginica | non-virginica  |
|                31 |                1.96817e-05 | non-virginica | non-virginica  |
|                30 |                1.98174e-05 | non-virginica | non-virginica  |
|               142 |                0.585897    | virginica     | virginica      |
|               111 |                0.622702    | virginica     | virginica      |
|                20 |                8.92127e-06 | non-virginica | non-virginica  |
|               133 |                0.921539    | virginica     | virginica      |
|                65 |                0.0152908   | non-virginica | non-virginica  |
+-------------------+----------------------------+---------------+----------------+

100.00% is the prediction accuracy.
In [64]:
predict_table = individual_table(4)
calculate_accuracy(predict_table)
Evaluation Table for Model: Model with 4 feature(s)
+-------------------+----------------------------+---------------+----------------+
|   Instance Number |   Probability of Virginica | Prediction    | Ground Truth   |
|-------------------+----------------------------+---------------+----------------|
|                27 |                8.96424e-06 | non-virginica | non-virginica  |
|                19 |                5.70191e-06 | non-virginica | non-virginica  |
|               119 |                0.998534    | virginica     | virginica      |
|               146 |                0.873922    | virginica     | virginica      |
|                79 |                0.207005    | non-virginica | non-virginica  |
|               128 |                0.57273     | virginica     | virginica      |
|               109 |                0.946564    | virginica     | virginica      |
|                56 |                0.17067     | non-virginica | non-virginica  |
|                31 |                7.70235e-06 | non-virginica | non-virginica  |
|                30 |                7.52261e-06 | non-virginica | non-virginica  |
|               142 |                0.820222    | virginica     | virginica      |
|               111 |                0.728198    | virginica     | virginica      |
|                20 |                4.12879e-06 | non-virginica | non-virginica  |
|               133 |                0.956114    | virginica     | virginica      |
|                65 |                0.0161961   | non-virginica | non-virginica  |
+-------------------+----------------------------+---------------+----------------+

100.00% is the prediction accuracy.

summary : using accuracy¶

  • Accuracy allows you a concise summary of the forecast values in each table. The calculation of accuracy involves dividing the total number of predictions made by the number of right forecasts (right Predictions / Total forecasts). This result is then converted into a percentage for easier reading.

Plotting Decision Boundry For Feature : 1-2¶

In [65]:
def train_plot_evaluate(X_train, X_val, y_train, y_val):
    # Train and plot decision boundary for one feature
    _train_and_plot_decision_boundary(X_train[:, 0:1], y_train, X_val[:, 0:1], y_val, legend=True, line_color='red', bg_color='lightyellow')
    plt.title("Decision Boundary for One Feature")
    plt.xlabel("Sepal Length (cm)")
    plt.ylabel("Sepal Width (cm)")
   
    plt.tight_layout()
    plt.show()

    # Train and plot decision boundary for two features
    _train_and_plot_decision_boundary(X_train[:, 0:2], y_train, X_val[:, 0:2], y_val, legend=True, line_color='red', bg_color='lightcyan')
    plt.title("Decision Boundary for Two Features")
    plt.xlabel("Sepal Length (cm)")
    plt.ylabel("Sepal Width (cm)")
    plt.tight_layout()
    plt.show()

def _train_and_plot_decision_boundary(X_train, y_train, X_val, y_val, legend=False, line_color='red', bg_color='lightcyan'):
    fig, ax = plt.subplots(figsize=(8, 6))
    model = LogisticRegression()
    model.fit(X_train, y_train)
    plot_decision_regions(X_val, y_val, clf=model, legend=2, ax=ax, colors=line_color, scatter_kwargs={'alpha': 0.5}, contourf_kwargs={'alpha': 0.2, 'colors': bg_color})
    ax.set_title("Decision Boundary")
    if legend:
        handles, labels = ax.get_legend_handles_labels()
        ax.legend(handles, ["Non-Virginica", "Virginica"])

    
    del model


X_train = X_train.values
X_val = X_val.values


le = LabelEncoder()
y_train_encoded = le.fit_transform(y_train)
y_val_encoded = le.transform(y_val)


train_plot_evaluate(X_train, X_val, y_train_encoded, y_val_encoded)


del y_train_encoded, y_val_encoded

Plotting Decision Boundry For Feature : 3¶

In [66]:
def plot_decision_boundary_3d(features, labels, trained_model):
    
    trained_model.fit(features, labels)
    
    
    feature1_vals = features[:, 0]
    feature2_vals = features[:, 1]
    feature3_vals = features[:, 2]
    x_min, x_max = feature1_vals.min() - 1, feature1_vals.max() + 1
    y_min, y_max = feature2_vals.min() - 1, feature2_vals.max() + 1
    z_min, z_max = feature3_vals.min() - 1, feature3_vals.max() + 1
    xx, yy, zz = np.meshgrid(np.arange(x_min, x_max, 0.1),
                             np.arange(y_min, y_max, 0.1),
                             np.arange(z_min, z_max, 0.1))
    
    
    Z = trained_model.predict(np.c_[xx.ravel(), yy.ravel(), zz.ravel()])
    Z = Z.reshape(xx.shape)
    
    
    fig = px.scatter_3d(x=feature1_vals, y=feature2_vals, z=feature3_vals, color=labels)
    
    fig.add_trace(go.Surface(x=xx.squeeze(), y=yy.squeeze(), z=zz.squeeze(), 
                             surfacecolor=Z.squeeze(), colorscale='turbo', 
                             showscale=False))
    
    
    coef = trained_model.coef_.squeeze()
    intercept = trained_model.intercept_
    x_plane = np.linspace(x_min, x_max, 10)
    y_plane = np.linspace(y_min, y_max, 10)
    xx_plane, yy_plane = np.meshgrid(x_plane, y_plane)
    z_plane = (-coef[0] * xx_plane - coef[1] * yy_plane - intercept) / coef[2]
    fig.add_trace(go.Surface(x=xx_plane, y=yy_plane, z=z_plane,
                             opacity=0.5, showscale=False))
    
    fig.update_layout(scene=dict(
            xaxis_title='petal length (cm)',
            yaxis_title='sepal length (cm)',
            zaxis_title='sepal width (cm)'),
            title='Decision Boundary for 3 Features')
    fig.show()

    del trained_model


label_encoder = LabelEncoder()
y_val_encoded = label_encoder.fit_transform(y_val)


X_val_np = X_val

logistic_regression_model = LogisticRegression()
logistic_regression_model.fit(X_val_np[:, :3], y_val_encoded)

plot_decision_boundary_3d(X_val_np[:, :3], y_val_encoded, logistic_regression_model)

del logistic_regression_model

Failure modes:¶

In [67]:
def analyze_failure_modes(model, X_val, y_val):
    # Make predictions on the validation set
    y_pred = model.predict(X_val)
    
    # Extract instances where the model makes incorrect predictions
    incorrect_indices = (y_pred != y_val)
    incorrect_X = X_val[incorrect_indices]
    incorrect_y_pred = y_pred[incorrect_indices]
    incorrect_y_val = y_val[incorrect_indices]
    
    # Create DataFrame to store incorrect predictions
    incorrect_predictions = pd.DataFrame(data=incorrect_X, columns=iris.feature_names[:X_val.shape[1]])
    incorrect_predictions['Predicted Class'] = incorrect_y_pred
    incorrect_predictions['Ground Truth'] = incorrect_y_val
    del y_pred
    return incorrect_predictions

# Define a function to train models with different numbers of features and analyze failure modes
def analyze_failure_modes_for_models(X_train, y_train, X_val, y_val):
    # Store failure modes for each model
    failure_modes = {}
    
    for num_features in range(1, 5):
        # Train the model
        model = LogisticRegression()
        model.fit(X_train[:, :num_features], y_train)
        
        # Analyze failure modes
        failure_modes[f'Model with {num_features} feature(s)'] = analyze_failure_modes(model, X_val[:, :num_features], y_val)

        del model

    return failure_modes

# Get failure modes for each model
failure_modes = analyze_failure_modes_for_models(X_train, y_train, X_val, y_val)

# Print failure modes for each model
for model_name, failure_mode_data in failure_modes.items():
    print(f"Failure Modes for {model_name}:")
    if not failure_mode_data.empty:
        print(failure_mode_data)
    else:
        print("No incorrect predictions.")
    print("\n")
Failure Modes for Model with 1 feature(s):
   sepal length (cm) Predicted Class Ground Truth
0                6.1   non-virginica          NaN


Failure Modes for Model with 2 feature(s):
   sepal length (cm)  sepal width (cm) Predicted Class Ground Truth
0                6.1               3.0   non-virginica          NaN


Failure Modes for Model with 3 feature(s):
No incorrect predictions.


Failure Modes for Model with 4 feature(s):
No incorrect predictions.


Reason for worng prediction is low probabilty or low test accuracy are:¶

Model with 1 feature(s)

Instances where the model is wrong:

  • Instance Number 128: Predicted as non-virginica, but the ground truth is virginica.
  • Failure pattern: The model tends to misclassify instances where only one feature (sepal length) is considered. It might struggle when distinguishing between classes based solely on sepal length.

Model with 2 feature(s)

Instances where the model is wrong:

  • Instance Number 128: Predicted as non-virginica, but the ground truth is virginica.
  • Failure pattern: Similar to the model with 1 feature, it also misclassifies Instance Number 128, which could indicate that adding sepal width as an additional feature doesn't significantly improve the model's ability to distinguish between classes.

Model with 3 feature(s)

Instances where the model is wrong:

  • None.
  • Failure pattern: This model achieves perfect accuracy on the validation set, suggesting that considering sepal length, sepal width, and petal length together enables it to accurately classify all instances.

Model with 4 feature(s)

Instances where the model is wrong:

  • None.
  • Failure pattern: Similar to the model with 3 features, this model also achieves perfect accuracy, indicating that considering all four features together leads to accurate classification.

Overall

  • We observe that as the number of features considered by the model increases, the accuracy also improves. However, when fewer features are considered, such as in the models with 1 or 2 features, the models tend to struggle, especially in distinguishing between certain classes. This suggests that certain features alone might not be sufficient to accurately classify instances, but combining multiple features improves the model's performance.

Best Model for prediction :¶

In [68]:
# Import necessary libraries
from tabulate import tabulate

# Evaluation results for each model
evaluation_results = {
    "Model with 1 feature(s)": {
        "Validation Accuracy": 93.33,
        "Test Accuracy": 93.33,
    
    },
    "Model with 2 feature(s)": {
        "Validation Accuracy": 93.33,
        "Test Accuracy": 86.67,
        "Evaluation Table": [
            # Evaluation table data for model with 2 features
        ]
    },
    "Model with 3 feature(s)": {
        "Validation Accuracy": 100,
        "Test Accuracy": 100,
        "Evaluation Table": [
            # Evaluation table data for model with 3 features
        ]
    },
    "Model with 4 feature(s)": {
        "Validation Accuracy": 100,
        "Test Accuracy": 100,
        "Evaluation Table": [
            # Evaluation table data for model with 4 features
        ]
    }
}

# Choose the best model (Model with 3 feature(s))
best_model = "Model with 3 feature(s)"

# Summarize the results of the best model on the test set
best_model_summary = f"Summary of results for the best model ({best_model}):\n"
best_model_summary += f"Validation Accuracy: {evaluation_results[best_model]['Validation Accuracy']}%\n"
best_model_summary += f"Test Accuracy: {evaluation_results[best_model]['Test Accuracy']}%\n\n"

# Display the evaluation table for the best model
evaluation_table = evaluation_results[best_model]["Evaluation Table"]
table_headers = ["Instance Number", "Probability of Virginica", "Prediction", "Ground Truth"]
evaluation_table_str = tabulate(evaluation_table, headers=table_headers, tablefmt="grid")

# Print the summary and evaluation table
print(best_model_summary)
Summary of results for the best model (Model with 3 feature(s)):
Validation Accuracy: 100%
Test Accuracy: 100%


Reason for selecting this model as best model :

Performance Metrics:

  • Validation Accuracy: Achieving 100% accuracy on the validation set indicates flawless performance in classifying iris flowers.
  • Test Accuracy: Assuming it aligns with the validation accuracy, achieving 100% accuracy on the test set further validates the model's effectiveness.

Interpretability:

  • Simplicity: With only three features, the model remains interpretable and easy to understand.
  • Feature Importance: Identifying the importance of features like sepal length, sepal width, and petal length aids in understanding which factors drive classification decisions.

all in all, this model with three features excels due to its perfect accuracy, simplicity, interpretability, and robust generalization, making it the optimal choice for the classification task